Squeeze morph: Adding UCs tests #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged: 9 commits into diffpy:main on Apr 19, 2025

Conversation

@Luiskitsu (Contributor) commented Apr 11, 2025

I have written the UC tests from issue 181 for the squeeze morph. I am keeping the morphed x-axis the same as the unmorphed one, as we discussed, and I have left the morph function empty. My inputs are a uniform target and a squeezed morph; the expected morph is the unsqueezed morph.

@sbillinge (Contributor) left a comment

beautifully done....a thing of beauty!!!

please see my inline comments


Configuration Variables
-----------------------
squeeze
@sbillinge (Contributor):

More recently we have been using this syntax:

squeeze : list
  The polynomial coefficients [a0, a1, ..., an] for the squeeze function, where the polynomial
  is of the form a0 + a1*x + a2*x^2 and so on. The order of the polynomial is determined by
  the length of the list.

I removed the "array-like". Do we want to allow that, or do we just want a list?

@Luiskitsu (Contributor Author):

Let's just have a list here; I think it makes sense and is consistent.


This applies a polynomial to squeeze the morph non-linearly. The morphed
data is returned on the same grid as the unmorphed data.

@sbillinge (Contributor):

it may be nice to have a usage example in the docstring. Basically copy-paste the code from the test to show how the morph can be instantiated and then used.

We normally put docstrings in the methods and in the constructor (the def __init__():). This class inherits the constructor from the base class, so I am not 100% sure how the docstring propagates through to the documentation without it, but I think if we put a docstring under the def morph(): that overloads the base-class method, it should become clear.

@Luiskitsu (Contributor Author):

So I was following the architecture of the other morphs. They have a docstring below the class, which is the main description, and then another short docstring under the function. Should I keep it consistent?
When I add an example in the docstring, do I add it as code using >>>?
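For reference, a ">>>"-style (doctest) example in a docstring looks like the following minimal sketch; `unsqueeze` is a made-up toy function for illustration, not part of diffpy.morph:

```python
import doctest


def unsqueeze(x, a0):
    """Shift x by a constant (toy stand-in for a squeeze polynomial).

    Example
    -------
    >>> unsqueeze(1.0, 0.5)
    1.5
    """
    return x + a0


# doctest.testmod() finds the >>> lines above, runs them, and compares
# their printed output against the text that follows in the docstring
doctest.testmod()
```

A nice side effect of this style is that the examples can be exercised automatically, so they stay accurate as the code evolves.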

@sbillinge (Contributor):

We don't have a group standard for this, but more recently I think we are leaning towards having the class-level docstring be just a few words about the high-level intent of the class, with the constructor docstring carrying more detail. I did look into this question at one point and formed an opinion, but I don't fully remember what I found. It could have been when I was working on my beloved "DiffractionObject"s in diffpy.utils, but I am not sure; we could look there. All documentation is better than no documentation, so the standards are just for us to figure out the most effective (user-useful) way to write the docs, so that any time we spend on it has the most impact.

# UCs from issue 181: https://github.com/diffpy/diffpy.morph/issues/181
# UC2: Same grid
(np.linspace(0, 10, 101), np.linspace(0, 10, 101)),
# UC4: Target extends beyond morph
@sbillinge (Contributor):

maybe add "same grid density" in the comment? likewise below on all the comments.

Do we have any where the grid density varies? I think we want tests for where the target is finer and when the morph is finer. We probably don't need to do the full matrix of longer/shorter/finer/coarser I think, so just for the finer/coarser tests they can be arrays over the same range.

We can discuss what we want the code to do in each of those cases. If one is numerically unstable or leads to inaccuracies we can choose to raise an exception and not allow users to do it as our desired behavior, but we should decide what we want to happen and have a test for that.

If we do raise an exception, it goes in a separate test that we often give a name like test_function_bad() to differentiate it from test_function(), and we check that when the condition is met (something coarser that we don't like) the right exception (and a helpful error message) is raised.

We like error messages that are of the form "such and such went wrong. Try doing so and so to fix it", as this is most helpful to the user in general.
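A hedged sketch of the test_function_bad() pattern described above, using pytest.raises; `squeeze_strict` and its coarse-grid error condition are hypothetical, invented only to show the shape of such a test:

```python
import numpy as np
import pytest


def squeeze_strict(x_morph, x_target):
    """Hypothetical guard: refuse a morph grid coarser than the target grid."""
    if np.mean(np.diff(x_morph)) > np.mean(np.diff(x_target)):
        raise ValueError(
            "The morph grid is coarser than the target grid. "
            "Regrid the morph data onto a finer grid and try again."
        )
    return x_morph


def test_squeeze_strict_bad():
    # the coarse/fine pair should trip the guard and its helpful message
    with pytest.raises(ValueError, match="coarser"):
        squeeze_strict(np.linspace(0, 10, 11), np.linspace(0, 10, 101))
```

The `match=` argument also checks that the "such and such went wrong, try so and so" message actually reaches the user.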

@Luiskitsu (Contributor Author):

Yes, we have those cases covered: UC6, where the target has the finer grid density, and UC8, where the morph has the finer grid density. I'll update the comments in the test definitions to make this clearer, including specifying the grid-density differences explicitly.

I think it's fine for the morph and target to have different grid densities in these tests, since we're not performing operations that compare the two arrays. That will become critical during refinement, where the grid densities will affect the comparison between morph and target.

@Luiskitsu (Contributor Author):

I have edited the comments and docstrings as you mentioned. The test covers same grids as well as different grids for target vs morph inputs.
When we are happy about the tests I will write the function code.
For the tests, how do we choose the tolerance, given interpolation/extrapolation inaccuracies? For scattering UCs the squeezing parameters will be very small, so the tolerance can be very small, but for other use cases the squeezing parameters can be much larger. I guess we can also think of it the other way around: how do we choose the squeezing parameters for the tests? So far I am using values that are much larger than what we would expect in diffraction experiments.

@sbillinge (Contributor) left a comment

lovely! One last comment on the docstring, but let's push on and implement the functionality!

>>> from numpy.polynomial import Polynomial
>>> from diffpy.morph.morphs.morphsqueeze import MorphSqueeze

>>> x_target = np.linspace(0, 10, 101)
@sbillinge (Contributor):

Let's remove this block of text because it will confuse people. This is us building a target, but the assumption is that the user already has a target, so none of this code is needed by people just to use the morph.

>>> poly = Polynomial(squeeze_coeff)
>>> y_morph = np.sin(x_morph + poly(x_morph))

>>> morph = MorphSqueeze()
@sbillinge (Contributor):

this block is what we are after. We could add a few words about what the variables mean. Maybe also add a couple of lines that show other attributes the user can access from the morph instance, for example things like x_morph_in = morph.x_morph_in, etc. No need to be exhaustive, just a few examples to help users.

btw, this looks great in general. I am not sure if we have a "group standard" for these code blocks. I think it is actually possible to write these so that they can be run by automated testing tools, but we don't really do that. I just feel that, as a user, if there are a few lines of code showing how to use this, it can be super helpful.

@@ -25,17 +25,17 @@
]
morph_target_grids = [
# UCs from issue 181: https://github.com/diffpy/diffpy.morph/issues/181
# UC2: Same grid
# UC2: Same range and same grid density
@sbillinge (Contributor):

This is now fantastic! I love these tests. Future us will love you.



@Luiskitsu (Contributor Author):

I have moved the main docstring below the function definition, and the class-level docstring is now the short description. I have also added comments to the example.
Once you push, I will work on the functionality. For the tests, do we want to add a tolerance to account for interpolation/extrapolation inaccuracy?

@Luiskitsu (Contributor Author):

Also, I just realized that some of my input 3rd-order coefficients are too large, so the x-array is not strictly increasing and loops back. When we work on the functionality, one idea is to add a warning when the x-array does not increase due to large high-order polynomial coefficients, and to just return the input data and tell the user to change the initial coefficient values and run it again.
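The idea could look something like this sketch (a hypothetical helper, not the diffpy.morph implementation; the coefficient values are only illustrative):

```python
import warnings

import numpy as np
from numpy.polynomial import Polynomial


def squeezed_axis_or_warn(x_morph, squeeze_coeffs):
    """Return the squeezed x-axis, warning when it is not strictly increasing."""
    x_squeezed = x_morph + Polynomial(squeeze_coeffs)(x_morph)
    if not np.all(np.diff(x_squeezed) > 0):
        warnings.warn(
            "The squeezed x-axis is not strictly increasing. "
            "Reduce the higher-order squeeze coefficients and run again."
        )
    return x_squeezed


x = np.linspace(0, 10, 101)
squeezed_axis_or_warn(x, [0.1, 0.01, 0.001, 0.0001])  # well behaved, no warning
squeezed_axis_or_warn(x, [0.1, 0.01, 0.001, -0.01])   # cubic term flips the axis
```

The monotonicity check is cheap (one np.diff pass), so it can run on every call before the interpolation step.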

@sbillinge (Contributor):

> I have edited the comments and docstrings as you mentioned. The test covers same grids as well as different grids for target vs morph inputs. When we are happy about the tests I will write the function code. For the tests, how do we choose the tolerance, given interpolation/extrapolation inaccuracies? For scattering UCs the squeezing parameters will be very small, so the tolerance can be very small, but for other use cases the squeezing parameters can be much larger. I guess we can also think of it the other way around: how do we choose the squeezing parameters for the tests? So far I am using values that are much larger than what we would expect in diffraction experiments.

I think we are ready to write the code now. Regarding tolerances, the kind of behavior I would want is for the tolerances to be very tight on non-extrapolated regions; I could tolerate a slightly larger tolerance on extrapolated data (that is tagged as such, as we discussed). It is probably a mistake to allow loose tolerances on all points just so tests pass for a few points at the ends. For these unit tests, I think it just computes a squeezed function, so it is not really extrapolating anything, just computing it over a wider range. Maybe use 1e-6 or something?

@Luiskitsu (Contributor Author) commented Apr 14, 2025

> I think we are ready to write the code now. Regarding tolerances, the kind of behavior I would want is for the tolerances to be very tight on non-extrapolated regions; I could tolerate a slightly larger tolerance on extrapolated data (that is tagged as such, as we discussed). It is probably a mistake to allow loose tolerances on all points just so tests pass for a few points at the ends. For these unit tests, I think it just computes a squeezed function, so it is not really extrapolating anything, just computing it over a wider range. Maybe use 1e-6 or something?

Sounds great! Should we push this first, or should I keep working in the same PR?
I can make the non-extrapolated regions have very tight tolerances. For these tests we are extrapolating in certain scenarios: when we squeeze unsqueezed data. When we unsqueeze squeezed data we don't extrapolate.
Here is a visual example from one test showing when we would need to extrapolate:
[Figure omitted: plot of one test case, showing the region where the morph must extrapolate.]
What I can do is what we mentioned: also return the index_min and index_max where there is no extrapolation, compare that region to the expected result with a tight tolerance, and then compare the extrapolated region with a larger tolerance.

@sbillinge (Contributor):

This is great. Honestly, I am not sure we need this for these tests because no regression is going on; we are just making sure that a known function is correctly computed. Let's put tight tolerances on the tests and see if they pass when we write the code; if not, we can go back. Since we decided we would capture the indices (and values?) of the points where the extrapolation occurs, we should write tests for those too. What shall we save to those attributes when there is no extrapolation? I think None would be best, rather than the value at the end of the array, because it makes it easy to check later whether an extrapolation happened.

tl;dr: I like this approach but not sure if it is needed, so let's keep this in our back pocket in case we need it. But let's write tests for the morph low and high extrapolation indices/values.

@sbillinge (Contributor):

btw, we should keep going on this PR.

@Luiskitsu (Contributor Author):

Our tests do involve extrapolation. The morph function uses y_morph_actual = CubicSpline(x_squeezed, y_morph)(x_morph), which extrapolates when x_morph is outside x_squeezed, while our expected result is y_morph_expected = np.sin(x_morph). Right now I am returning the indices before and after interpolation and comparing the interpolated region with a tight tolerance (1e-6). I could change the expected morph to match the spline exactly, but that would just be verifying the same code against itself.
So far 47 tests are passing and one test is failing; I think it is due to a larger interpolation error in one scenario.
I've also adjusted the 3rd-order coefficient (from 0.001 to 0.0001) to avoid any axis flipping, which should not happen in any UC. Then I can add an assert for each extrapolated region, but with larger tolerances than 1e-6; in some cases I expect the extrapolation error to be large (>1).
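For concreteness, the mechanics under discussion can be sketched stand-alone as follows (assuming scipy's CubicSpline; the names mirror the PR discussion, but this is not the merged diffpy.morph code):

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy.interpolate import CubicSpline


def apply_squeeze(x_morph, y_morph, squeeze_coeffs):
    """Evaluate the squeezed data back on x_morph; report extrapolated edges."""
    x_squeezed = x_morph + Polynomial(squeeze_coeffs)(x_morph)
    y_out = CubicSpline(x_squeezed, y_morph)(x_morph)
    left = np.where(x_morph < x_squeezed[0])[0]
    right = np.where(x_morph > x_squeezed[-1])[0]
    extrap_index_low = left[-1] if left.size else None    # last extrapolated point on the left
    extrap_index_high = right[0] if right.size else None  # first extrapolated point on the right
    return y_out, extrap_index_low, extrap_index_high


x = np.linspace(0, 10, 101)
coeffs = [0.1, 0.01, 0.001]
y_squeezed = np.sin(x + Polynomial(coeffs)(x))
y_recovered, lo, hi = apply_squeeze(x, y_squeezed, coeffs)
# away from the extrapolated left edge, y_recovered should track sin(x) closely
```

With these positive coefficients, x_squeezed starts above x, so only the left edge of x_morph falls outside the spline's data range and needs extrapolation.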

@sbillinge (Contributor) left a comment

nearly there. Few comments

Returns
-------
A tuple (x_morph_out, y_morph_out, x_target_out, y_target_out,
min_index, max_index) where the target values remain the same and
@sbillinge (Contributor):

we don't need to return min and max index; these can just be updated in place. Also, these names are not so intuitive to me; maybe something like extrap_index_low and extrap_index_high? Let's define these and set them to None when MorphSqueeze is instantiated.

)
left_extrap = np.where(self.x_morph_in < x_squeezed[0])[0]
right_extrap = np.where(self.x_morph_in > x_squeezed[-1])[0]
min_index = left_extrap[-1] if left_extrap.size else None
@sbillinge (Contributor):

self.extrap_index_low (or left) and don't return it.

Describe this variable in the constructor where it is first defined.

right_extrap = np.where(self.x_morph_in > x_squeezed[-1])[0]
min_index = left_extrap[-1] if left_extrap.size else None
max_index = right_extrap[0] if right_extrap.size else None
return self.xyallout + (min_index, max_index)
@sbillinge (Contributor):

apart from it being a bad idea in general, this breaks backwards compatibility.

low_extrap_idx,
high_extrap_idx,
) = morph(x_morph, y_morph, x_target, y_target)
if low_extrap_idx is None and high_extrap_idx is None:
@sbillinge (Contributor):

this seems too complicated to me, and also does it test the extrapolation? How about something like

if low_extrap_idx is None:
    low_extrap_idx = -1
if high_extrap_idx is None:
    high_extrap_idx = len(y_morph_actual)
assert np.allclose(y_morph_actual[low_extrap_idx + 1 : high_extrap_idx], y_morph_expected[low_extrap_idx + 1 : high_extrap_idx], atol=1e-6)
assert np.allclose(y_morph_actual[: low_extrap_idx + 1], y_morph_expected[: low_extrap_idx + 1], atol=1e-5)
assert np.allclose(y_morph_actual[high_extrap_idx:], y_morph_expected[high_extrap_idx:], atol=1e-5)
assert morph.low_extrap_idx == expected_low_extrap_idx
assert morph.high_extrap_idx == expected_high_extrap_idx

@Luiskitsu (Contributor Author):

Hi Simon, thank you very much for the feedback! I have updated the test and the function accordingly. Some tests are failing, and I am pretty sure it is because there are cases where the extrapolation is large and inaccurate.

@sbillinge (Contributor):

> Hi Simon, thank you very much for the feedback! I have updated the test and the function accordingly. Some tests are failing, and I am pretty sure it is because there are cases where the extrapolation is large and inaccurate.

that's interesting in its own right. What do you think is a good behavior? One thing we can do in principle is to allow a user to specify a tolerance and alert them when an extrapolation exceeds that tolerance. We could then have a default tolerance for the interpolation and for the extrapolation.

That could actually be a nice "scientifically rigorous" solution. However, it is making things more complicated. Maybe a suggestion would be to

  1. decide tolerances we are comfortable with
  2. write tests that pass with those tolerances (so reduce the size of the parameters)
  3. write an issue for a future "feature" that implements the idea above and leave it for, well, the future.....

@Luiskitsu (Contributor Author):

I like your idea where the user can specify a tolerance and we warn them when extrapolated values exceed it.
For the current tests I think a tight tolerance of 1e-6 for interpolated values and a looser tolerance of 1e-3 for extrapolated values makes sense. I have made some of the high-order coefficients smaller so that the tests pass. We could also express the tolerance as a percentage of the target data rather than an absolute difference.
I will open an issue for the user-specified tolerance and the tolerance warning.

codecov bot commented Apr 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.99%. Comparing base (d1c2695) to head (fcee820).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #182      +/-   ##
==========================================
+ Coverage   96.85%   96.99%   +0.13%     
==========================================
  Files          18       19       +1     
  Lines         795      831      +36     
==========================================
+ Hits          770      806      +36     
  Misses         25       25              
Files with missing lines Coverage Δ
tests/test_morphsqueeze.py 100.00% <100.00%> (ø)

@sbillinge sbillinge merged commit 4914488 into diffpy:main Apr 19, 2025
5 checks passed
@sbillinge (Contributor):

super!!!!! got it merged. That was a huge job but we learned a lot about our morph functions so it was time well spent.

We should decide what is next. We could

  1. generalize the scale and other morphs to have similar behavior
  2. make a function morph that allows a user to specify a python function and the morph applies that to the y and/or x axes
  3. implement the variable tolerance feature
  4. something else

We started this so we could do the thing of morphing a low-Qmax F(Q) onto a high-Qmax one. I think we want (2) for that; then, for the actual UC, the function would be the one in PDFgetX3 that applies the corrections to get the F(Q).

Thoughts?

@Luiskitsu (Contributor Author) commented Apr 19, 2025

Thanks again for all your mentoring!
I'm a bit confused about how we've been handling different Q-grids and Q-ranges in our tests for MorphSqueeze. I've been testing diffpy.morph.morph_api for stretching, and it looks like it can already handle different Q-grids and Q-ranges as inputs.
Looking at the tests in test_morphstretch.py, there are no tests for cases with differing Q-ranges or Q-grids. Could it be that the handling of different x-axes has already been taken care of by morphrgrid.py?
We've been focusing on testing MorphSqueeze, but it seems we may not have fully checked how Q-range and Q-grid handling is implemented in the full morph chain. We might need to revisit our code so that it does not do the same thing twice. I think it is fine if we keep it as it is, since it won't hurt anything, but I believe the data is regridded before being refined.

@Luiskitsu (Contributor Author) commented Apr 19, 2025

Regarding next steps, I think we should develop both a "multiplicative" morph and an "additive" morph to convert from I(q) to F(q) given an F(q) target (for our XFEL UC).
Let's say we have I(q) data from an XFEL and we want to morph it to match the F(q) target from synchrotron data.
The user should be able to specify the polynomial order for the additive component, and a polynomial or another function (such as multiple Gaussians) for the multiplicative component.
With this functionality, the morphing process would handle the entire conversion from I(q) to F(q) by fitting both the additive and multiplicative components. The result would be the morphed F(q) on the same grid as the input I(q).
For the additive component, I think a polynomial makes the most sense as it is flexible. For the multiplicative component, we could give the user the option to choose Gaussians or another function.
This morphing would give us everything we need for our XFEL UC, and we can make it more general for other UCs.
We could also combine the multiplicative and additive components in the same function morph.
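As a rough illustration of this proposal (purely a sketch; `fit_mult_add` and its signature are invented, and diffpy.morph has no such function): with polynomial m(x) and a(x), the model y_target ≈ m(x)*y_morph + a(x) is linear in all of its coefficients, so both components can be fit in a single least-squares solve:

```python
import numpy as np


def fit_mult_add(x, y_morph, y_target, mult_deg=2, add_deg=2):
    """Fit y_target ≈ m(x) * y_morph + a(x) with polynomials m and a.

    Hypothetical helper; returns the fitted curve and the stacked
    coefficients (multiplicative first, then additive).
    """
    mult_cols = [y_morph * x**k for k in range(mult_deg + 1)]
    add_cols = [x**k for k in range(add_deg + 1)]
    design = np.column_stack(mult_cols + add_cols)
    coeffs, *_ = np.linalg.lstsq(design, y_target, rcond=None)
    return design @ coeffs, coeffs
```

A Gaussian multiplicative component would make the problem nonlinear in its parameters, in which case something like scipy.optimize.least_squares would be needed instead of a single linear solve.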
